In data mining and machine learning (ML), the feature selection problem has received
considerable attention from researchers in recent times. The feature selection process
aims to minimize the number of features and maximize classification accuracy by
identifying the optimal features. Because multiple objectives are considered while
identifying the optimal features, multi-objective metaheuristic optimization algorithms
(MOMOAs) are applied. In this study, a literature review of MOMOAs for solving the
wrapper-based feature selection (WFS) problem is performed, and the challenges faced by
researchers in solving the feature selection problem are discussed. The review covers
all relevant studies published between 2009 and 2022. A detailed overview of the
feature selection preliminaries, MOMOAs for WFS, and the role of the classifier in the
feature selection problem is presented. The outcome of this literature review is to
highlight the existing works related to the WFS problem using MOMOAs. Finally, research
areas for improvement are identified and emphasized for researchers to explore in the
field of MOMOAs.
1. INTRODUCTION

Real-world problems involve massive amounts of data, and managing these data becomes an
extremely complex process. A dataset may consist of a vast number of features or attributes, and
not all of its contents carry usable information. Some attributes or features can be unrelated or
redundant, which reduces the performance of the model. The goal of the feature selection problem is
to minimize the dataset size while maintaining performance accuracy. The aim of this study is to
examine how the challenging feature selection problem is solved by applying machine learning (ML)
techniques with the help of multi-objective metaheuristic optimization algorithms (MOMOAs). For
example, if there are n features in a set, then 2^n subsets are possible in total, from which the
optimal subset is chosen. The task becomes complex when n is large, because an evaluation model
must be built for each candidate subset. To manage this kind of situation, various search techniques
such as exhaustive search, random search, and greedy search are applied to solve the feature
selection problem and choose the optimal subset. These techniques have drawbacks such as
complexity, premature convergence, and high computational cost and time. Hence, MOMOAs are used to
handle these conditions. This literature survey covers the multi-objective metaheuristic
optimization algorithms developed between 2009 and 2022 for various applications to solve the
wrapper feature selection (WFS) problem.
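
The growth of the search space described above can be illustrated with a minimal Python sketch.
The feature names and the helper all_feature_subsets are hypothetical and serve only to show why
exhaustive evaluation of every subset quickly becomes impractical as n grows.

```python
from itertools import combinations


def all_feature_subsets(features):
    """Yield every non-empty subset of the given feature names."""
    for size in range(1, len(features) + 1):
        for subset in combinations(features, size):
            yield subset


# Hypothetical feature names, used only for illustration.
features = ["f1", "f2", "f3", "f4"]
subsets = list(all_feature_subsets(features))

# Counting the empty set as well, n features give 2**n subsets.
print(len(subsets) + 1 == 2 ** len(features))

# The number of candidate subsets for a few values of n.
for n in (10, 20, 30, 40):
    print(n, 2 ** n)
```

With only four features there are 15 non-empty subsets to evaluate. Each additional feature
doubles the count, which is the reason an exhaustive wrapper search does not scale.
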
Feature selection is applied in image and text mining, bioinformatics, computer vision,
and medical and industrial applications. Feature selection enhances classification accuracy by
selecting appropriate features and eliminating unrelated and repetitive ones, thus reducing the
dimensionality of the data [1], [2]. Feature selection is considered an NP-hard problem, with 2^n
states for n features. The complexity of the problem grows as n increases. Feature extraction
approaches such as principal component analysis (PCA) [3] and linear discriminant analysis (LDA) [4]
are also considered; they produce new features from the original features through a functional
mapping process, which reduces the search space. The two goals of the feature selection process are
to maximize the classification performance and to minimize the number of features. Multi-objective
optimization algorithms aid in selecting the features. This literature review provides up-to-date
work related to feature selection from a multi-objective perspective, and discusses the challenges
and the forthcoming scope of the work. The main contributions of this study are: i) the basic
concepts of the feature selection problem definition, search techniques, evaluation measures and
multi-objective metaheuristic algorithms are elaborated, ii) the multi-objective metaheuristic
algorithms for feature selection are surveyed in detail, classified and listed, iii) a review of WFS
using metaheuristic algorithms is presented, and iv) the research gap is identified and suggestions
are given for future work to improve the research on WFS.

The paper is structured as follows: section 2 presents the preliminary details of the feature
selection problem, such as vital definitions, search techniques and evaluation measures. Section 3
discusses the multi-objective metaheuristic algorithms for solving WFS and illustrates the role of
classifiers in feature selection. Section 4 presents the conclusion and the scope for future work in
the WFS approach.


2. FEATURE SELECTION PRELIMINARIES 

This section describes the definition of feature selection, the mathematical model of the
feature selection problem and the concepts of feature selection. In ML techniques, feature
selection is considered the most essential pre-processing step. Features that are irrelevant or
redundant can adversely affect model performance [5]. Irrelevant features in particular can reduce
the accuracy of the model [6]. Feature selection refers to choosing a subset of suitable features
from the original feature set [2]. The advantages of feature selection are: i) decreasing
redundant and over-fitted data makes decision making easier, ii) precision is enhanced by reducing
misleading data, and iii) the time, the number of data points and the algorithm complexity are
reduced, which quickens the training of the algorithm.

Feature selection can be mathematically framed as follows. Let us assume a dataset S with d
features, S = {f1, f2, f3, ..., fd}. The relevant features are selected from among the d features,
so that an ideal feature subset of S is obtained: D = {f1, f2, f3, ..., fn}, where n < d and
f1, f2, f3, ..., fn represent the selected attributes. The overall feature selection process starts
from a dataset with the whole feature set. A feature selection algorithm extracts a feature
subset, and the results are then validated based on the selection criteria. The five elements of the
feature selection process are the original dataset, feature subset selection, evaluation, selection
criteria and validation. The three categories of feature selection are filter, wrapper and embedded
methods [1], [2], [7], [8]. The filter methods are independent of the classifier and focus on the
overall characteristics of the data [9]. The wrapper method incorporates a classification algorithm
and interacts with the classifier. It is more expensive than the filter method but provides more
accurate results. Hybrid methods combine the filter and wrapper approaches. When the selection is
carried out as part of the training process of the classifier, the method uses the learning
algorithm and is treated like a wrapper method [10]. The wrapper method obtains better results than
the other methods, and it depends on the modelling algorithm for every subset that is generated.
Various search strategies are used for the wrapper methods. Jovic et al. [11] group the search
approaches into exponential, sequential and random categories.

In exponential algorithms, the number of evaluated subsets increases exponentially with the
number of features. This approach produces accurate results, but it is impractical to apply due to
its high computational cost. Exhaustive search and the branch and bound method are examples [12],
[13]. In the sequential algorithm category, features are added or removed sequentially. Once a
feature is added to or removed from the subset, the decision cannot be changed, which can lead to
local optima. Linear forward selection and floating forward or backward selection are some of the
sequential algorithms [14]. Random algorithms explore the search space randomly and are designed to
avoid getting trapped in local optima. Simulated annealing, metaheuristic algorithms and random
generation are some of the random and population-based search approaches.
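
The sequential category can be made concrete with a minimal sketch of a wrapper using sequential
forward selection. Everything in it is an assumption made for illustration: the toy dataset, the
function names, and the choice of a leave-one-out 1-nearest-neighbour classifier as the evaluation
model. It is not the procedure of any study reviewed here.

```python
def loo_1nn_accuracy(X, y, subset):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier
    that only looks at the feature indices in `subset`."""
    correct = 0
    for i, xi in enumerate(X):
        best_j, best_d = None, float("inf")
        for j, xj in enumerate(X):
            if i == j:
                continue  # leave sample i out of its own neighbourhood
            d = sum((xi[k] - xj[k]) ** 2 for k in subset)
            if d < best_d:
                best_j, best_d = j, d
        correct += (y[best_j] == y[i])
    return correct / len(X)


def sequential_forward_selection(X, y):
    """Greedily add the feature that most improves wrapper accuracy.
    A feature that has been added is never removed again."""
    n_features = len(X[0])
    selected, best_score = [], 0.0
    while len(selected) < n_features:
        candidates = [f for f in range(n_features) if f not in selected]
        score, feature = max(
            (loo_1nn_accuracy(X, y, selected + [f]), f) for f in candidates
        )
        if score <= best_score:
            break  # no candidate improves the score: stop
        selected.append(feature)
        best_score = score
    return selected, best_score


# Toy data (hypothetical): feature 0 separates the classes, 1 and 2 are noise.
X = [[0.0, 5.0, 1.0], [0.1, 1.0, 9.0], [0.2, 8.0, 4.0],
     [1.0, 2.0, 8.0], [1.1, 7.0, 2.0], [1.2, 4.0, 5.0]]
y = [0, 0, 0, 1, 1, 1]

selected, score = sequential_forward_selection(X, y)
print(selected, score)  # -> [0] 1.0
```

The classifier is retrained and evaluated for every candidate subset, which is what makes the
wrapper approach expensive. Because an added feature is never reconsidered, the greedy search can
stop at a local optimum on less benign data.
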

The vital factors of the feature selection problem are the search technique, the number of
objectives and the evaluation measures. Bio-inspired algorithms such as the genetic algorithm (GA)
[15], [16], particle swarm optimization (PSO) [17], [18], ant colony optimization (ACO) [19], [20],
and grey wolf optimizers (GWO) [21]-[24] are efficient techniques used to solve the feature
selection problem. The limitations of these techniques are getting stuck in local optima and high
computational costs. Many single-objective techniques were adapted, and multi-objective techniques
for solving the feature selection problem were then introduced, such as multi-objective GA (MOGA)
[25], [26], MOPSO [27] and MOGWO [28], [29].

Feature selection methods are generally grouped into wrapper and filter approaches, in which
subsets of features are evaluated for classifiers. The wrapper method is more expensive than filters
in terms of computational cost, but it produces better classification results than filters. Other
researchers classify the feature selection methods into filters, wrappers and embedded methods [30].
Embedded methods combine the classifier and feature selection in one process [7], [31]. The
multi-objective approach aims at finding Pareto front solutions rather than the single solution
produced by a single-objective problem [32]. The set of non-dominated solutions (the Pareto optimal
set) is a subset of all solutions in the feasible decision space. The Pareto front is the set of all
points mapped from the Pareto optimal set [33]. An optimal feature selection process is formulated
by identifying the key attributes of the set and the relationship between the data classes.
Multi-objective (MO) optimization can be used to overcome these challenges [34]. The multi-objective
minimization problem is mathematically represented as:


\text{Minimize } F(x) = [f_1(x), f_2(x), \ldots, f_k(x)]    (1)
\text{Subject to } g_i(x) \le 0, \quad i = 1, 2, 3, \ldots, m    (2)
h_j(x) = 0, \quad j = 1, 2, 3, \ldots, l    (3)

where each f_i(x) is an objective function of x, k denotes the number of objective functions, and
g_i(x) and h_j(x) are the constraint functions.
This review focuses on wrapper methods that use random search algorithms, with particular
attention to metaheuristic algorithms. In particular, multi-objective optimization algorithms are
reviewed. Swarm intelligence-based algorithms, physics-based algorithms and human-related
algorithms are the kinds of metaheuristic algorithms present in the literature for various
applications.
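
The notion of non-dominated solutions used in this section can be sketched in a few lines of
Python. The sketch assumes two objectives that are both minimized, classification error and number
of selected features; the candidate values are invented for illustration and do not come from any
reviewed study.

```python
def dominates(a, b):
    """True if objective vector a dominates b when every objective is
    minimized: a is no worse everywhere and strictly better somewhere."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(points):
    """Return the non-dominated points in their original order."""
    return [p for p in points
            if not any(dominates(q, p) for q in points if q != p)]


# Hypothetical candidate subsets as (classification error, feature count).
candidates = [(0.10, 8), (0.12, 5), (0.12, 6),
              (0.20, 3), (0.25, 3), (0.09, 12)]

front = pareto_front(candidates)
print(front)  # -> [(0.1, 8), (0.12, 5), (0.2, 3), (0.09, 12)]
```

Here (0.12, 6) is dominated by (0.12, 5) and (0.25, 3) by (0.20, 3). The remaining four points
trade error against subset size, so none of them is better than another on both objectives. A
multi-objective algorithm returns such a set instead of a single solution.
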


3. MULTI-OBJECTIVE META-HEURISTIC OPTIMIZATION ALGORITHMS AND ROLE OF 
CLASSIFIERS IN WRAPPER FEATURE SELECTION 
This section discusses some of the methods, experimental results and findings of MOMOAs for
WFS. Table 1 illustrates the search technique, evaluation metrics and multi-objective idea of the
research studies related to WFS.


Table 1. Literature review related to method 


Publication | Search method | Evaluation metrics | Objectives | Results and findings
[35] | MOFS-BDE technique | Wrapper and k-nearest neighbors (KNN) | Minimization: attribute number and classification error | MOFS-BDE is superior to the existing DE, PSO, GA, ABC and MOEA methods at the 0.05 level
[36] | MO-ABC algorithm | Wrapper method and KNN | Minimization: attribute number and classification error | Numeric and binary versions of MO-ABC are performed and the results outperform NSSABC
[37] | MO-Bat algorithm | Wrapper method and KNN, SVM | Minimization: attribute number and classification error | MOBA shows superior performance to the existing methods
[38] | MOFS Rank | Wrapper method, linear SVM | Minimization: attribute number and classification error | MOFS rank is superior; the LETOR datasets were used
[28] | MOGWO algorithm | SVM, wrapper method | Minimization: attribute number and classification error | MOGWO and MOFA results are superior in terms of accuracy and feature reduction
[39] | MOGA algorithm | Wrapper method and KNN | Minimization: attribute number and classification error | MOGA selects fewer features and its accuracy rate is higher compared to single-target GA
[40] | MOGA (NSGA-II) algorithm | Wrapper method and SVM | Minimization: attribute number and classification error | MOGA (NSGA-II) is superior in terms of AUC and provides better classification accuracies
[41] | MOUFSA algorithm | Wrapper method, k-means and KNN | Minimization: attribute number and classification error | MOUFSA is superior to MOFSA1, MOFSA2 and FMOFSA
[42] | Deep belief network | Wrapper method, deep belief network | Attribute number and reconstruction error | The proposed result outperforms the baseline method
[43] | Deep Boltzmann | Wrapper method and Deep Boltzmann | Minimization: attribute number and reconstruction error | The results demonstrate the proposed approach by selecting features without reducing the accuracy

This section also discusses the various classifiers and their performance in the wrapper
approach. Table 2 illustrates the classifiers used with the wrapper approach. Figure 1 illustrates
how often each classification approach is used. The SVM classifier is used most often in the
previous studies.


Table 2. Classifiers using wrapper approach 


Publication | Classifier | Description | Performance
[44] | SVM | Supervised learning; hyperplanes are built for large-scale dimensional space | Performance of SVM is good in terms of accuracy. It is computationally expensive
[45] | SVM | The classifier is used for reducing the generalization error | Performance of SVM is good in terms of accuracy. It is computationally expensive
[46] | KNN | Used for supervised learning. Scans to find the nearest match with the test information | Best performance in dealing with classification compared to SVM, and computationally expensive
[47] | Naive Bayes (NB) | NB is a basic algorithm that produces good outcomes; it classifies using straightforward assumptions that restrict the attributes | NB performs well for small datasets. Performance degrades when dealing with large datasets
[48] | Decision tree (DT) | Classification and regression model. If-then rules for classification that are mutually exclusive and exhaustive | DT does not perform well for large datasets
[49] | Random forest (RF) | One of the finest ML algorithms for classification, with high accuracy | Works well when the dataset size is small



Figure 1. Various classification approaches


4. CONCLUSION 

This literature review covered multi-objective metaheuristic optimization for solving the
wrapper-based feature selection problem. The feature selection definitions, the search techniques,
the evaluation measures and the role of the classifier in feature selection were described in
detail. A detailed survey of wrapper feature selection based on multi-objective metaheuristic
algorithms was carried out. The key components of multi-objective feature selection, such as the
search mechanism, the number of objectives and the applications, were presented. Multi-objective
feature selection using the wrapper method with the SVM classifier is efficient for dealing with
high-dimensional data instances. The performance is measured in terms of accuracy and the number of
attributes. Hybridization approaches related to multi-objective feature selection were discussed.
The research gap was identified: future work to improve the research on WFS can address binary
feature selection, and human-related search algorithms for optimization can be studied. Further,
random search techniques combined with the SVM classifier and the WFS model can be explored.